Measuring School and 
Teacher Value Added for 
IMPACT and TEAM in 
DC Public Schools 

Final Report 

August 20, 2010

Eric Isenberg 
Heinrich Hock 



MPR Reference Numbers: 

06742.150 
06325.415

Submitted to: 

DC Public Education Fund 
1534 14th Street, NW
Washington, DC 20005 
Project Officer: Cate Swinburn 

District of Columbia Public Schools 
1200 First Street, NE
Washington, DC 20002 
Project Officer: Hella Bel Hadj Amor 

New Leaders for New Schools 
30 West 26th Street 
New York, NY 10010 
Project Officer: Dianne Houghton 

Submitted by: 

Mathematica Policy Research 
600 Maryland Avenue, SW 
Suite 550
Washington, DC 20024-2512
Telephone: (202) 484-9220
Facsimile: (202) 863-1763

Project Director: Eric Isenberg (06742) 
Project Director: Duncan Chaplin (06325) 









ACKNOWLEDGMENTS 

We are grateful to the many people who contributed to this report. Special thanks go to Chris 
Mathews and Dianne Houghton at New Leaders for New Schools and Cate Swinburn of the 
District of Columbia Public Education Fund for their support of our work. We thank Hella Bel 
Hadj Amor, Jason Kamras, and Erin McGoldrick at the District of Columbia Public Schools for 
working together to build a value-added model that meets the needs of the District of Columbia 
Public Schools. We also thank Eric Hanushek and Tim Sass, the independent reviewers who made 
valuable suggestions for improvement. 

At Mathematica Policy Research, Mary Grider, assisted by Emma Ernst, Francesca Palik, and 
Jeremy Page, processed the data and provided expert programming. Duncan Chaplin and Steven 
Glazerman provided valuable comments. Amanda Bernhardt and Betty Teller edited the report, and 
Lisa Walls and Jackie McGee provided word processing and production support. 






CONTENTS

I OVERVIEW

A. Using Value-Added to Measure Performance

B. A Value-Added Model for DCPS

C. Challenges and Solutions

D. Caveats

II DATA

A. DC CAS Test Scores

B. Student Background Data

C. School and Teacher Dosage

1. School Dosage

2. Teacher Dosage

3. Teacher Teams

III THE VALUE-ADDED MODEL

A. Estimation Equation

B. Measurement Error in the Pretests

C. Creating Total Grade-Specific Teacher Estimates

D. Combining Estimates Across Grades

E. Shrinkage Procedure

F. Calculating School Scores That Combine Math and Reading Estimates

REFERENCES






I. OVERVIEW 

The District of Columbia Public Schools (DCPS) has incorporated measures of school and 
teacher effectiveness, based on student test score growth, into a new teacher assessment system 
known as IMPACT. At the same time, New Leaders for New Schools (New Leaders) has been 
working with DCPS to offer financial awards to effective educators in the district. To support these 
efforts, both organizations asked Mathematica Policy Research to design a value-added model to 
measure school and teacher performance in the district. Mathematica developed these measures by 
adapting and tailoring methods used in our earlier work for New Leaders for other schools and 
districts (Booker and Isenberg 2008; Booker et al. 2008; Isenberg 2008; Potamites et al. 2009a; 
Potamites et al. 2009b). 1 

Implemented for the first time during the 2009-2010 school year, IMPACT is an assessment
system with significant consequences. Prior to the start of the school year, DCPS categorized all 
staff into one of 20 groups. Everyone received an IMPACT score based on a point-based formula 
that was tailored to the job responsibilities of their group and availability of data. Individual value- 
added scores constituted half of the IMPACT score of “Group 1” teachers, who were regular 
education teachers in grades and subjects with sufficient student test score data. 2 Most of the 
remaining points for these teachers depended on a series of structured classroom observations. In 
addition, for almost all groups (including Group 1), school value-added scores counted for five 
percent of the IMPACT score. Based on their IMPACT score, teachers were placed into one of four 
performance categories. Teachers in the lowest category were subject to separation; those in the 
highest category were eligible for additional compensation. 

Results from the school value-added model will also be used by DCPS and New Leaders as part 
of the TEAM (Together Everyone Achieves More) program. TEAM is designed to encourage and 
identify effective leadership and teaching practices by providing financial awards to all staff — 
principals, teachers, and others — in schools that produce the largest gains in student achievement as 
measured by their value added. 

DCPS and New Leaders sought an objective, fair, and transparent value-added model to assess 
school and teacher effectiveness. Mathematica developed such a model in accord with these 
principles. It was reviewed by two independent value-added experts, Eric Hanushek of the Hoover 
Institution at Stanford University and Tim Sass of Florida State University. In the rest of this 
chapter, we describe the main features of the method in nontechnical terms. Chapter II presents the 
data used, and Chapter III focuses on the technical details of the statistical methods. 



1 This project has been funded by the DC Public Education Fund, DCPS, and New Leaders. 

2 DCPS plans to expand testing in future years so that more teachers will be covered by Group 1. For more details
on IMPACT, see http://dcps.dc.gov/DCPS/In+the+Classroom/Ensuring+Teacher+Success/IMPACT+(Performance+Assessment).






A. Using Value-Added to Measure Performance 

Many commonly used measures of school and teacher effectiveness provide an incomplete 
picture. In many districts, teachers are evaluated based on observations by the principal that provide 
a snapshot of performance but do not necessarily indicate how much students learn as a result of the 
teacher’s talent and efforts. Schools are often ranked by their students’ average test score or the 
percentage of students who meet state proficiency standards, measures that do not account for prior 
learning or other student characteristics. Although schools certainly affect students’ current test 
scores and proficiency levels, so too do the students’ prior education and nonschool factors like the 
influence of parents. An alternate measure of effectiveness would isolate how much a school or 
teacher contributes to student test score improvements apart from confounding factors outside the 
school’s or teacher’s control. 

To measure the performance of schools and teachers in DCPS, we used test scores and other 
data in a statistical model designed to capture the students’ test score growth attributable to a school 
or teacher compared to the progress the students would have made at the average school or with the 
average teacher. Known as a “value-added model” because it isolates the contribution of the school
or teacher from other factors, this method has been used by a number of prominent researchers 
(Meyer 1997; Sanders 2000; McCaffrey et al. 2004; Raudenbush 2004; Hanushek et al. 2007) and is 
employed in measuring the performance of schools and/or teachers in many school districts, 
including Chicago, Dallas, Milwaukee, Minneapolis, and New York City. 

A value-added model measures teachers’ contributions to students’ achievement growth and 
typically accounts for the effect of student background characteristics on that growth. For example, 
suppose that a sixth-grade reading teacher has a class of students whose average score on the fifth-
grade reading test, or “pretest,” was 3 points above the school district average. Further suppose that
students with similar background characteristics (like poverty status or disability status) typically
grow 2 points more than the district average. So, given their starting point, these students would
ordinarily end the year 5 points above the school district average on the sixth-grade reading test, or
“posttest.” The value-added model derives a relative measure of the teacher’s effectiveness by
comparing the average student posttest score to this standard. In this example, if the class posttest 
average is exactly 5 points above average, the value-added model will identify the teacher as an 
average performer. If the class posttest average exceeds this standard, the value-added model will 
identify the teacher as above average, and if the average is less than the standard, the value-added 
model will identify the teacher as below average. Because a value-added model focuses on growth 
and accounts for students’ initial performance, it allows any schools or teachers to be identified as 
high performers, regardless of whether students were high-performing or low-performing at 
baseline. 
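The arithmetic of this example can be sketched in a few lines of Python. This is only an illustration of the comparison described above, not the estimation procedure presented in Chapter III; the function name `relative_performance` and its arguments are hypothetical.

```python
def relative_performance(pretest_gap, typical_growth_gap, posttest_gap):
    """Compare a class's posttest average with the standard implied by its starting point.

    All arguments are in test-score points relative to the district average.
    """
    # The standard: where these students would ordinarily end the year.
    expected_gap = pretest_gap + typical_growth_gap
    difference = posttest_gap - expected_gap
    if difference > 0:
        label = "above average"
    elif difference < 0:
        label = "below average"
    else:
        label = "average"
    return expected_gap, difference, label


# Example from the text: pretest 3 points above average, typical growth 2 points more.
print(relative_performance(3, 2, 5))  # (5, 0, 'average')
print(relative_performance(3, 2, 8))  # (5, 3, 'above average')
```

In the second call, the class ends the year 3 points above the standard of 5 points, so the teacher would be identified as above average.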

Value-added models provide a better measure of school or teacher effectiveness than alternate 
measures, such as those that rely on gains in the proportion of students achieving proficiency. Those 
gains measure growth only for students who cross the proficiency cut-point, while value-added 
models incorporate achievement gains for all students, regardless of their baseline achievement 
levels. In addition, unlike schoolwide proficiency rates, which are affected by changes in the 
composition of the student population, value-added models track individual students over time. 
Potamites and Chaplin (2008) used DCPS data from 2005 to 2007 to show that measures of school 
effectiveness based on proficiency gains were not highly correlated with measures based on value- 
added estimates. The low correlation was primarily due to changes in the composition of students 
from one year to the next. 






B. A Value-Added Model for DCPS 

We estimated the performance of DCPS schools and teachers using a value-added model based 
on District of Columbia Comprehensive Assessment System (DC CAS) tests in math and reading. 
We measured school and teacher effectiveness in these two subjects separately. Based in part on 
information gathered during focus groups it conducted with teachers, DCPS sought to limit the 
accountability for student performance to the time period after the creation of IMPACT. We 
therefore based value-added measures for schools and teachers on one year of test score growth, 
from the 2008-2009 school year to the 2009-2010 school year. 

School performance was based on as many grades as possible. Math and reading are tested in 
grades 3-8 and 10. Since third-grade students do not have a pretest, we estimated school
performance only for grades 4-8 and 10. Schools that cover any of those grades were eligible to be
included in the model. To avoid basing a school score on few students, which could lead to an 
imprecise measure of school performance, DCPS requested school value-added scores be reported 
only if there were at least 25 eligible students in the tested grades and subjects. Elementary and 
middle school students were included in the model if they had a posttest from 2010 and a pretest 
from the same subject in the previous grade in 2009. We excluded grade repeaters in these grades so 
that achievement growth for all students in a grade was based on the same posttest and pretest, 
allowing for meaningful comparisons between schools. However, because of this, not all students 
were included in the value-added model. The DC CAS test was not administered in grade 9, so we 
took the grade 10 pretest from grade 8. Most grade 10 students took the grade 8 DC CAS tests in 
spring 2008, but a sizable minority — 16 percent — took them in spring 2007. We therefore used tests 
with either a two- or three-year lag between pretest and posttest for grade 10 students. 

We calculated value-added estimates of teacher effectiveness separately from estimates of 
school effectiveness. We included regular education teachers who taught reading and/or math in 
grades 4-8, the subjects and grades with a posttest at the end of the year and a pretest the year before.
Based on concerns about the precision of value-added estimates for teachers with few students,
DCPS asked that we report estimates only for teachers with a minimum of 15 students during the
2009-2010 school year.

To avoid penalizing or rewarding schools or teachers for factors that were outside their control, 
we designed the value-added model to account for a set of student characteristics that could affect 
posttest scores. These included a student’s pretest scores in math and reading, poverty status, limited 
English proficiency, special education status, and gender. In the teacher model, we also accounted 
for student attendance during the prior year. In the school model, we accounted for whether grade 
10 students took the grade 8 test in 2007 or 2008 in case there might have been a systematic 
difference in test score growth between these two groups of students. Although a student’s 
race/ethnicity may be correlated with factors that both affect test scores and are beyond a teacher’s
control, DCPS chose not to account for this characteristic because preliminary results showed a high
correlation in value-added measures regardless of whether race/ethnicity was considered.

C. Challenges and Solutions 

Although the basic concepts associated with using a value-added model to measure school or 
teacher performance are straightforward, complexities arise when applying the model to data. We 
discuss five challenges to estimating school or teacher effectiveness fairly, and outline our solutions. 






(1) Student Mobility Across Schools. When students change schools mid-year, multiple 
schools are responsible for their academic growth. To credit a single school with complete 
responsibility for a student who changes schools, or to ignore that student entirely, would distort our 
measure of a school’s effectiveness. In DCPS in the 2009-2010 school year, four percent of the 
students were educated for part of the year at multiple schools. To account for this, we allocated 
proportional credit based on the fraction of time the student spent at each school, which can be 
thought of as the “dosage.” The analysis included students who moved between DCPS schools in a 
single year as well as those who spent part of the year outside DCPS, as long as they took the 
DC CAS during the prior year and current year. For the school value-added model, we measured the 
dosage for grade 10 students over two years since most students took the pretest two years earlier. 

(2) Co-Teaching. If two teachers co-taught students, it was not generally possible to 
distinguish the separate effects of each teacher on these students through statistical methods. In the 
2009-2010 school year, 20 percent of teachers taught students who were also educated in the same 
subject by another Group 1 teacher. Ten percent of teachers shared all their students; five percent 
shared between 10 and 99 percent of their students, and six percent shared more than zero and less 
than 10 percent of their students. 3 In some cases, two or more teachers were jointly responsible for a 
classroom of students at the same time. In other cases, groups of students were taught by one 
teacher for part of the year and another teacher for the remainder of the year. In these 
circumstances, we estimated the combined effectiveness of these teachers if they had seven or more 
students in common. Each teacher received the team score as their individual value-added score. For 
teachers who taught some students in teams and other students individually, their overall value- 
added score was the weighted average of individual and team measures, where the weights were 
proportional to the number of student-equivalents taught in each situation. 4 
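As a minimal sketch of the weighted average just described, assume a teacher has one individual measure and one team measure; a teacher who belonged to several teams would need one term per team. The function name and the example numbers are hypothetical.

```python
def combined_teacher_score(individual_score, individual_equivalents,
                           team_score, team_equivalents):
    """Weighted average of individual and team value-added measures.

    Weights are proportional to the student-equivalents taught in each situation.
    """
    total = individual_equivalents + team_equivalents
    if total <= 0:
        raise ValueError("teacher must have a positive number of student-equivalents")
    return (individual_score * individual_equivalents
            + team_score * team_equivalents) / total


# Hypothetical teacher: 18 student-equivalents taught alone (score +2.0)
# and 6 student-equivalents taught in a team (team score -1.0).
print(combined_teacher_score(2.0, 18.0, -1.0, 6.0))  # 1.25
```

Because three-quarters of this teacher's student-equivalents were taught individually, the overall score sits much closer to the individual measure than to the team measure.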

(3) Small Samples of Students. Performance estimates for schools and teachers could be 
misleading if they were based on too few students. Some students may score high on a test due to 
good luck rather than good preparation, and others may score low due to bad luck. For schools or 
teachers with many students, good and bad luck that affects test performance will tend to cancel out. 
A school or teacher with few students, however, can receive a very high or very low effectiveness 
measure based primarily on luck (Kane and Staiger 2002). We reduced the possibility of such 
spurious results by (1) not reporting estimates for schools with fewer than 25 students or for 
teachers with fewer than 15 students and (2) using a statistical technique that combines the measure 
of teacher performance obtained from the data with a default assumption of average performance 
that we made in the absence of data (Morris 1983). For an individual teacher estimate, we relied 
more heavily on the default assumption of average effectiveness when we had the least amount of 
data — typically teachers with fewer students or students whose achievement growth was most 
difficult to predict with a statistical model. 
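The idea behind item (2) can be illustrated with the standard empirical Bayes weighting, in which the weight on a teacher's own estimate falls as its sampling variance rises. This is a sketch under assumed inputs, not the report's procedure, which is detailed in Chapter III; the function name, the variance of true teacher effects, and the example values are all hypothetical.

```python
def shrink_toward_average(estimate, standard_error, true_variance, average=0.0):
    """Blend a noisy estimate with the default assumption of average performance.

    weight = true_variance / (true_variance + standard_error**2), so imprecise
    estimates (large standard errors) are pulled more strongly toward the average.
    """
    noise_variance = standard_error ** 2
    weight = true_variance / (true_variance + noise_variance)
    return weight * estimate + (1.0 - weight) * average


# Two hypothetical teachers with the same raw estimate (+6 points) but different precision.
precise = shrink_toward_average(6.0, standard_error=1.0, true_variance=4.0)
noisy = shrink_toward_average(6.0, standard_error=4.0, true_variance=4.0)
print(round(precise, 2), round(noisy, 2))  # 4.8 1.2
```

The teacher whose estimate rests on less information keeps only a fifth of the raw estimate, while the more precisely measured teacher keeps four-fifths of it.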



3 Percentages are given for math but are similar for reading. The totals for the three subgroups do not sum exactly 
to 20 percent due to rounding. 

4 The number of student-equivalents per teacher is based on the self-reported contact time a teacher spent with the 
students in his or her classes. A student who was enrolled in a school from the first day of class until the test date and 
was assigned to a teacher’s classroom for the full amount of classroom time devoted to a particular subject (math or 
reading) was counted as one student-equivalent. Fractional student-equivalents were possible if (1) a student changed 
schools during the year; (2) a student changed teachers during the year; or (3) a student was not assigned to a teacher for 
the full amount of classroom time devoted to the subject. For example, the student may have participated in a pullout 
program two days a week. 






(4) Measurement Error in the Test. Because a student’s performance on a single test is an 
imperfect measure of ability, schools or teachers may unfairly receive credit or blame for the initial 
performance of their students, rather than being assessed on the gains they have produced in student 
learning. For example, teachers of students with very high pretest scores may receive unfair 
measures of their performance if these test scores are attributable in part to luck; the average pretest 
score might have been lower if the students had been retested the next week. In such a case, the 
average ability level measured will be higher than their true ability level when the students enter the 
teacher’s classroom, so part of the learning growth that occurs that year would not be credited to 
this teacher. To compensate for this sort of measurement error in pretest scores, we employed a 
statistical technique (Kmenta 1997) that makes use of published information on the test/retest 
reliability of a given DC CAS test. 
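To see why test reliability matters, consider the classical attenuation result for a regression with a single noisy predictor: the estimated slope on the pretest is biased toward zero by a factor equal to the reliability, and dividing by the reliability undoes the bias. The sketch below shows only this one-variable simplification; the technique used in the report handles a model with several control variables and is described in Chapter III. The function name and the numbers are hypothetical.

```python
def disattenuate_slope(observed_slope, reliability):
    """Classical attenuation correction for one predictor measured with error.

    With reliability r (share of pretest variance that is true signal), the
    observed slope is approximately r times the true slope.
    """
    if not 0.0 < reliability <= 1.0:
        raise ValueError("reliability must be in (0, 1]")
    return observed_slope / reliability


# Hypothetical values: an observed pretest slope of 0.72 and a reliability of 0.90.
print(round(disattenuate_slope(0.72, 0.90), 2))  # 0.8
```

An uncorrected slope that is too small understates how much of the posttest is explained by prior achievement, which is how teachers of high-scoring or low-scoring students could otherwise receive unfair measures.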

(5) Comparing Value-Added Estimates Across Grades. The DC CAS is not specifically 
designed for users to compare gains across grades. Comparing value-added measures stated in terms 
of raw DC CAS points cannot meaningfully establish which teacher performed better if the teachers 
taught different grades. To compare teachers of different grades, we translated each teacher’s value- 
added estimate into a metric of “generalized” DC CAS points using a two-step procedure. First, we 
adjusted teachers’ value-added scores so that the average teacher in each grade received the same 
value-added score. Second, we multiplied each teacher’s score by a grade-specific conversion factor 
to ensure that the dispersion of teacher value-added scores by grade was similar. To compare 
schools with different grade configurations, we applied a similar strategy. We transformed each 
grade-level measure within a school into a measure stated in generalized DC CAS points and then 
averaged across grades to arrive at a composite value-added measure for the school. 
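The two-step procedure for teachers can be sketched as follows. The overview does not state how the grade-specific conversion factor was defined, so this sketch assumes it is the ratio of the average standard deviation across grades to the standard deviation within the grade; that choice, the function name, and the example data are hypothetical.

```python
from statistics import mean, pstdev


def to_generalized_points(scores_by_grade):
    """Center teacher scores within grade, then equalize dispersion across grades.

    scores_by_grade maps grade -> {teacher: raw value-added estimate}.
    Each grade needs at least two teachers with differing scores.
    """
    grade_sd = {g: pstdev(list(s.values())) for g, s in scores_by_grade.items()}
    target_sd = mean(list(grade_sd.values()))  # assumed common dispersion
    generalized = {}
    for grade, scores in scores_by_grade.items():
        grade_mean = mean(list(scores.values()))   # step 1: same average in every grade
        factor = target_sd / grade_sd[grade]       # step 2: assumed conversion factor
        generalized[grade] = {t: (v - grade_mean) * factor for t, v in scores.items()}
    return generalized


raw = {4: {"A": 1.0, "B": 3.0}, 5: {"C": 0.0, "D": 6.0}}
print(to_generalized_points(raw))
```

In this example the grade 5 scores are three times as dispersed as the grade 4 scores before conversion; afterward, both grades have a mean of zero and scores of about -2 and +2.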

D. Caveats 

It is important to recognize the limitations of any performance measures, including those 
generated by a value-added model. Below, we discuss three caveats that are especially important for 
interpreting and using the results of a value-added model like the one we created for DCPS. 

(1) Estimation Error. The value-added measures are estimates of a school or teacher’s 
performance based on the available data and the value-added model used. As with any statistical 
model, there is uncertainty in the estimates produced, which implies that two teachers with similar
value-added estimates may be “statistically indistinguishable” from one another. We quantified the
precision with which the measures were estimated by reporting the upper and lower bounds of a 
confidence interval of performance for each teacher. Similar to Schochet and Chiang (2010), this 
approach also allowed us to quantify the misclassification rate for teachers under various policy 
scenarios. For example, for teachers with the lowest possible IMPACT score in math — the bottom 
3.6 percent of DCPS teachers — one can say with at least 99.9 percent confidence that these teachers 
were below average in 2010. Similarly, a DCPS teacher with the lowest possible IMPACT score in 
reading — in the bottom 3.8 percent — was below average with at least 99.9 percent confidence. 

(2) Classroom Effects. Value-added estimates measure not only the effectiveness of the 
teacher but also the combined effect of all factors that affect student achievement in the classroom. 
This includes inputs from the school, including direct effects, like the impact the school’s physical 
plant has on test score growth, and indirect effects that work through teachers, such as the 
leadership abilities of the principal. Although a value-added model uses statistical techniques to 
account or “control” for differences in student performance based on documented sources of 
information about students, such as their prior-year test score or free lunch eligibility, the model 
cannot control for differences in student performance that arise from sources that are not explicitly
measured. Thus some caution should be applied when comparing teachers across schools
(Aaronson, Barrow, and Sander 2007).

(3) Unmeasured Differences Between Students. The implicit assumption of a value-added 
model is that if two classrooms contain students with identical documented characteristics, the 
students will not differ systematically in ways that affect test score growth but are not easily 
measured. For example, the students’ level of motivation to succeed would be presumed to be the
same in these two classrooms. If students were randomly assigned to teachers, they should not differ 
systematically on any characteristics. On the other hand, if the assignment of students to teachers 
was based on unobservable factors — for example, pairing difficult-to-teach students with teachers 
who have succeeded with similar students in the past — a value-added model might unfairly penalize 
these teachers because it cannot statistically account for factors that cannot be measured. 

There is debate among value-added researchers about how important this caveat is in practice 
(Kane and Staiger 2008; Rothstein 2009; Koedel and Betts 2009). Using data from the Los Angeles 
Unified School District, Kane and Staiger (2008) offer some evidence suggesting that unobservable 
student characteristics based on student assignment do not play a large role in determining value- 
added scores. They compared (a) the difference in value-added measures between pairs of teachers 
based on a typical situation in which principals assign students to teachers, and (b) the difference in 
student achievement between the teachers the following year, in which they taught classrooms that 
were formed by principals but then randomly assigned to the teachers. Kane and Staiger found that 
the differences between teachers’ value-added scores before random assignment were a statistically
significant predictor of achievement differences when classrooms were assigned randomly. These 
results were gathered in schools in which the principal was willing to allow random assignment of 
classrooms to teachers; it is not clear if they generalize to other contexts. 

Given these caveats, DCPS has chosen not to use value-added measures as the sole determinant 
of a teacher’s IMPACT score. 






II. DATA 

In this chapter, we review the data used to generate the value-added measures. We discuss the 
standardized assessment used in DC and the data on student background characteristics. We then 
discuss how we calculated the amount of time that students with multiple schools or teachers spent 
with each school or teacher. This discussion includes an overview of the roster confirmation process 
that allowed teachers to confirm whether and for how long they taught students math and/or 
reading and a description of how we identified team-teaching situations. 

A. DC CAS Test Scores 

When estimating the effectiveness of schools, we included elementary and middle school 
students if they had a DC CAS test from 2010 (the posttest) and a DC CAS test from the previous 
grade in the same subject in 2009 (the pretest). Students in grade 10 were included if they had a 
pretest from grade 8 in the same subject in either 2007 or 2008. 5 Beginning with 16,124 students for 
whom we had posttest scores, we excluded students from the analysis file if there were missing or 
conflicting data. 6 Of this group, 8 students had conflicting duplicate 2010 test score records, 41 
students lacked corresponding information in the student background data, and 5 students had test 
scores that were outside the valid range of scores for the grade in which they were enrolled in 2010. 
The most common reason we excluded students was for lack of a pretest score, which could occur if 
they were not enrolled in a DC school during the testing period in April 2009 or if they missed the 
testing date. A total of 1,890 students, or 11.7 percent, were excluded for this reason. Finally, 
elementary and middle school students who repeated or skipped a grade were excluded so that 
achievement growth for all students in a grade was based on the same posttest and pretest. 7 This led 
to the exclusion of 197 students, or 1.6 percent of the remaining sample; at the school level, the 
percentage of students excluded for this reason ranged from zero to 8.1 percent. The resulting 
analysis file for math contained 13,983 students at 111 schools, an average of 126.0 students per 
school. 
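The sample counts in this paragraph can be checked with simple arithmetic. The sketch below uses only the figures reported above; the variable names are hypothetical, and reading the 11.7 percent figure as a share of the initial 16,124 students is an assumption that happens to reproduce the reported value.

```python
students_with_posttest = 16_124
data_problems = {
    "conflicting duplicate 2010 test records": 8,
    "no matching student background data": 41,
    "score outside valid range for grade": 5,
}
no_pretest = 1_890
repeated_or_skipped_grade = 197
reported_final_sample = 13_983
schools = 111

# Apply the exclusions in the order given in the text.
remaining = students_with_posttest - sum(data_problems.values())
remaining -= no_pretest
remaining -= repeated_or_skipped_grade
assert remaining == reported_final_sample

pct_no_pretest = round(100 * no_pretest / students_with_posttest, 1)
students_per_school = round(reported_final_sample / schools, 1)
print(remaining, pct_no_pretest, students_per_school)  # 13983 11.7 126.0
```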

To obtain the most accurate and precise estimates of teacher effectiveness, we estimated the 
value-added model for teachers using all students in grades 4-8 who were in the analysis file for
school-level measures. This included some students who were not linked to a Group 1 teacher. We 
did not include grade 10 students in the teacher-level analysis because they lacked pretest data from 
the prior year. Of the remaining 12,121 students, 1,090, or 9.0 percent, were not linked to a Group 1 
DCPS teacher because they (1) were not linked to a DCPS school for at least 10 days, (2) were 
included in the roster file but not claimed by a teacher, or (3) were claimed only by a teacher with 
fewer than 7 students (we did not estimate a value-added measure for teachers with so few students). 
We reported estimates for teachers who taught 15 or more students in at least one subject. This 
included 480 teachers; of this group, 113 taught math only, 121 taught reading only, and 246 taught
both subjects, for a total of 359 teachers of math and 367 teachers of reading. In both subjects,
teachers averaged 31.8 students.

5 DCPS provided us with DC CAS test scores in math and reading from 2007 to 2009 and OSSE (Office of the
State Superintendent of Education, which oversees DCPS and DC charter schools) provided data for 2010.

6 Unless noted otherwise, sample sizes in this section are given for math. Sample sizes for reading were very
similar.

7 Students in grade 10 who either skipped a grade or repeated two grades between taking the grade 8 and grade 10
test were counted among the total who lacked a pretest.

For each subject, the DC CAS is scored so that each student receives a scale score from 300 to 
399 for third-grade students, 400 to 499 for fourth-grade students, and so on. The range for 10th- 
grade students is 900 to 999. The first digit is a grade indicator only; it does not reflect the student’s 
ability. The rest of the score, which ranges from 0 to 99, can only be meaningfully compared within 
grades and within subjects; math scores, for example, are generally more dispersed than reading 
scores within the same grade. To address this issue, before using the test scores in the value-added 
model, we created subject- and grade-specific z-scores by subtracting the mean and dividing by the 
standard deviation within a subject-grade combination. 8 This step allowed us to translate math and 
reading scores in every grade and subject into a common metric. To create a measure with a range 
resembling the original DC CAS point metric, we then multiplied each test score by the average 
standard deviation across all grades within each subject and year. 
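The standardization and rescaling steps for one subject and year can be sketched as follows. Whether the report used the population or sample standard deviation is not stated here, so the use of `pstdev` is an assumption, as are the function name and the example scores.

```python
from statistics import mean, pstdev


def standardize_and_rescale(scores_by_grade):
    """Convert scale scores to z-scores within grade, then rescale to a point-like metric.

    scores_by_grade maps grade -> list of scale scores for one subject and year.
    """
    grade_sd = {g: pstdev(s) for g, s in scores_by_grade.items()}
    average_sd = mean(list(grade_sd.values()))  # average SD across grades
    rescaled = {}
    for grade, scores in scores_by_grade.items():
        grade_mean = mean(scores)
        # z-score within the subject-grade combination, then multiply by the average SD
        rescaled[grade] = [(x - grade_mean) / grade_sd[grade] * average_sd for x in scores]
    return rescaled


# Hypothetical scores: grade 4 uses the 400s and grade 5 the 500s, with different spreads.
print(standardize_and_rescale({4: [440, 460], 5: [530, 570]}))
```

Here the grade 4 and grade 5 scores both become -15 and +15, so a student one standard deviation above the grade mean receives the same value in either grade.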

B. Student Background Data 

We used data provided by DCPS to construct variables that were used as controls in the value- 
added models for student background characteristics. In both the school and teacher value-added 
models, we controlled for the following: 

• Pretest in same subject as posttest 

• Pretest in other subject (so we controlled for math and reading pretests regardless of 
posttest) 

• Gender 

• Free lunch eligibility 

• Reduced-price lunch eligibility 

• Limited English proficiency status 

• Having a specific learning disability 

• Having other types of disabilities requiring special education 

In the school model, we also controlled for: 

• Taking the grade 8 DC CAS test in 2007 rather than in 2008 (for some grade 10 
students) 

In the teacher model, we also controlled for: 

• Proportion of days that the student attended school during the prior year 



8 Subtracting the mean score for each subject and grade creates a score that has a mean of zero in all subject-
grade combinations, effectively removing the uninformative first digit.






The last variable serves as a proxy for student motivation. We used prior- rather than current-year attendance to
avoid confounding student attendance with current-year teacher quality, as a good teacher might be 
expected to motivate students to attend more regularly than a weaker teacher. Attendance is a 
continuous variable that could range from zero to one. Aside from pretest variables, the other 
variables are indicator variables taking the value zero or one. 

The selection of these variables was based on data availability and careful judgment. For 
example, there were multiple categories of special education available in the administrative data, 
including information on students who received special test accommodations in one year but not in 
another. The choice of two categories for special education reflected a trade-off between a detailed 
specification, which allows for differences among different types of special education students, and a 
parsimonious specification, which avoids the problem of generating estimates that may be sensitive 
to outliers in the data. 

C. School and Teacher Dosage 

Because some students moved between schools or were taught by a combination of teachers, 
we apportioned their achievement growth among multiple schools or teachers. We refer to the 
fraction of time the student was enrolled with each teacher and at each school as the dosage. 

1. School Dosage 

Based on DCPS administrative data, which contain dates of school withdrawal and admission, 
we assigned every student a dosage for each school the student attended. School dosage equals the 
fraction of the school year that the student was officially enrolled at that school. Since students do 
not take tests on the last day of each school year, this measure covered the first three terms (that is, 
the fall semester and the first half of the spring semester); the third term ended eight school days 
before the beginning of testing. To fully account for 100 percent of each student’s time during the 
first three terms, we also recorded the portion of the school year the student was enrolled in schools 
outside DCPS. 

Because a school is unlikely to have an appreciable educational impact on a short-term student, 
we set dosage equal to zero for students who spent less than two weeks at a school. Conversely, we 
set it to 100 percent for those who spent all but two weeks at a school. Apart from this, in the 
school model we assumed that learning accumulated at a constant rate and treated days spent at one 
school as interchangeable with days spent at another. For example, if a student split time equally 
between two schools, each school was assigned a dosage of 50 percent for this student, regardless of 
which school the student attended first. Since the grade 8 DC CAS test served as the pretest for 
students in grade 10, we based dosage variables for grade 10 students on the schools they attended 
during the 2008-2009 and 2009-2010 school years (regardless of whether they had taken the grade 8 
DC CAS two or three years earlier). 
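The school dosage rules described above can be sketched in Python. This is only an illustration under stated assumptions, not the procedure used for the report: the function name, the choice of 10 school days as "two weeks," the handling of boundary cases, and the 130-day calendar in the usage example are all hypothetical. The sketch also makes no attempt to renormalize dosages after the two-week adjustments.

```python
def school_dosages(days_enrolled, term_days, min_days=10):
    """Return {school: dosage} for one student.

    days_enrolled: {school: school days enrolled during the first three terms}
    term_days:     total school days in the first three terms
    min_days:      assumed length of "two weeks" in school days (hypothetical)
    """
    dosages = {}
    for school, days in days_enrolled.items():
        if days < min_days:
            # Short-term student: the school receives no credit.
            dosages[school] = 0.0
        elif days > term_days - min_days:
            # Enrolled for all but (less than) two weeks: full credit.
            dosages[school] = 1.0
        else:
            # Otherwise learning is assumed to accumulate at a constant
            # rate, so dosage is the plain fraction of days enrolled.
            dosages[school] = days / term_days
    return dosages


# A student who split a hypothetical 130-day period equally between two schools.
even_split = school_dosages({"School A": 65, "School B": 65}, term_days=130)
```

In this sketch the order in which the schools were attended plays no role, which mirrors the assumption that days at one school are interchangeable with days at another.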

2. Teacher Dosage 

To determine which students were taught math and reading by a given teacher during the
2009-2010 school year, DCPS conducted roster confirmation in March 2010, covering teachers of
math and reading in grades 4-8. Teachers were provided with a list of students who appeared on
their course rosters at some point during the year. For each of the first three terms, teachers 
indicated whether they taught each subject to each student, and if so, the proportion of time they 
taught the student relative to the full amount of time the teacher spent on that subject for the typical 






student. For example, if a student spent two and a half days per week in a Group 1 teacher’s 
classroom learning math and two and a half days per week in another classroom with a special 
education teacher while other students were learning math with the Group 1 teacher, then this 
student spent 0.5 of the instructional time with the Group 1 teacher. In recording the proportion of 
time spent with a student, teachers rounded to the nearest quarter, so 0 percent, 25 percent, 50 
percent, 75 percent, and 100 percent were the possible responses. For students who spent less than 
100 percent of the time with a teacher, teachers did not indicate the name of the other teacher. Staff 
in the DCPS central office followed up with teachers who had many unclaimed students on their 
roster and in other anomalous cases. 

We used the confirmed class rosters to construct teacher-student links. If the roster 
confirmation data indicated that a student had one math or reading teacher at a school, the teacher- 
student weight equaled the school dosage. If a student changed teachers from one term to another, 
we used the school calendar to determine the number of days the student spent with each teacher, 
and we subdivided the school dosage among teachers accordingly. When teachers claimed the same 
students during the same term, as in a team-teaching situation, DCPS decided to assign each teacher 
full credit for the shared students, reflecting DCPS’ preference to weight students equally, whether 
they were taught individually or by co-teachers. We therefore did not subdivide dosage for students 
of team teachers. Finally, just as we tracked time spent at schools outside DCPS, we tracked the
time a student spent with any teachers who were not recorded in the confirmed class rosters, whom
we called the “non-Group 1 teacher(s).”
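The construction of teacher-student weights can be illustrated with a short Python sketch. It is a simplification under stated assumptions rather than the report's actual code: the function name, the input layout, and the term lengths in the usage example are hypothetical, the student is assumed to be enrolled at one school for all terms, and time left unclaimed by any teacher (which the report tracks under the non-Group 1 teacher(s)) is not modeled.

```python
def teacher_dosages(school_dosage, term_claims, term_days):
    """Return {teacher: weight} for one student in one subject.

    school_dosage: the student's dosage at the school (0 to 1)
    term_claims:   one dict per term, {teacher: share of instructional time},
                   where shares are the confirmed values 0, 0.25, ..., 1
    term_days:     number of school days in each term (from the calendar)
    """
    total_days = sum(term_days)
    weights = {}
    for claims, days in zip(term_claims, term_days):
        for teacher, share in claims.items():
            # Each claiming teacher is credited with the term's portion of
            # the school dosage. Co-teachers who claim the same student in
            # the same term each keep their full claim; the dosage is not
            # subdivided between them.
            credit = school_dosage * share * days / total_days
            weights[teacher] = weights.get(teacher, 0.0) + credit
    return weights


# Sequential teaching: T1 for two terms, then T2 (hypothetical term lengths).
sequential = teacher_dosages(1.0, [{"T1": 1.0}, {"T1": 1.0}, {"T2": 1.0}], [45, 45, 40])

# Team teaching: both teachers claim the student in every term.
co_taught = teacher_dosages(1.0, [{"T1": 1.0, "T2": 1.0}] * 3, [45, 45, 40])
```

With a single teacher for the whole period the weight reduces to the school dosage, and with co-teachers each teacher's weight equals the school dosage as well, consistent with the DCPS decision described above.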

3. Teacher Teams 

We created variables for teacher teams to model two situations: (1) “co-teaching,” in which
students received instruction from more than one Group 1 teacher for the same subject during the 
same term or (2) “sequential teaching,” in which a group of students switched from one teacher 
during one term to another teacher for another term. We formed teams only within schools and 
within grades. A teacher who taught more than one grade could therefore have multiple individual 
and/or team estimates. 

To prevent the estimates of teachers who shared students with unidentified teachers from 
becoming “contaminated” with the estimate of the catchall non-Group 1 teacher(s), we formed 
teams between a Group 1 teacher and the “non-Group 1 teacher(s).” Otherwise, because the 
estimate for the non-Group 1 teacher(s) was typically negative, the statistical model might 
compensate by attributing especially effective teaching to a Group 1 teacher who shared students to 
balance the negative estimate attributed to the non-Group 1 teacher(s).

Estimating value-added measures for individuals or teams based on too few students 
unacceptably increases the risk of introducing imprecise or biased estimates. Therefore, we required 
that an individual teacher or teacher team have at least seven students. We assigned students of 
individual teachers with fewer than seven total students to the catchall category of non-Group 1 
teacher(s). Students in teams with fewer than seven students were assigned to the individual
teachers. 

As an example of how our method worked, consider a teacher who taught a classroom of 24 
fourth-grade students, where 8 of the students participated in a pullout program with a special 
education teacher for half of the regular reading time. This teacher would receive two estimates for 
reading. For 16 students, there would be an estimate of the teacher’s individual effect; for the other 
8 students, there would be an estimate of the joint effect of the Group 1 teacher and the special 






education teacher. Because the teacher would receive 0.5 dosage for each of these half-time students,
the total number of student-equivalents from this group for the teacher would be 0.5 × 8 = 4. The
teacher’s overall value-added estimate would be a student-equivalent weighted average of the two
estimates, with the weights equal to 4/(4 + 16) = 1/5 for the team and 16/(4 + 16) = 4/5 for the
individual effect.
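The arithmetic in this example can be reproduced with a few lines of Python. The student counts and dosages come from the example above; the function name and the two value-added estimates (0.10 and -0.05) are hypothetical numbers chosen only to show the weighting.

```python
def combine_estimates(parts):
    """Student-equivalent weighted average of value-added estimates.

    parts: list of (estimate, student_equivalents) pairs
    """
    total = sum(weight for _, weight in parts)
    if total == 0:
        raise ValueError("no student-equivalents to weight by")
    return sum(estimate * weight for estimate, weight in parts) / total


# Student-equivalents from the example: 16 full-time students and
# 8 half-time students shared with the special education teacher.
individual_equivalents = 1.0 * 16
team_equivalents = 0.5 * 8

team_weight = team_equivalents / (team_equivalents + individual_equivalents)
individual_weight = individual_equivalents / (team_equivalents + individual_equivalents)

# Hypothetical estimates: 0.10 for the individual effect, -0.05 for the team.
overall = combine_estimates([(0.10, individual_equivalents),
                             (-0.05, team_equivalents)])
```

Here `team_weight` is 1/5 and `individual_weight` is 4/5, matching the weights in the text.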






III. THE VALUE-ADDED MODEL 
A. Estimation Equation 

We developed a value-added model that we used to measure four outcomes separately: school 
effectiveness in math, school effectiveness in reading, teacher effectiveness in math, and teacher 
effectiveness in reading. After assembling the analysis file, the first step for each outcome was to 
estimate a linear regression that combined students of all grade levels in the data. For school 
outcomes, we estimated an equation in which the posttest score depends on prior achievement, 
student background characteristics, schools attended, and a set of unmeasured factors. The equation 
can be expressed formally as: 

(1)  $Y_{ig} = \lambda_{1g} Y_{i(g-1)} + \omega_{1g} Z_{i(g-1)} + \alpha_1' X_i + \beta' S_{ig} + \varepsilon_{1ig}$,

where $Y_{ig}$ is the standardized posttest score for student i in grade g and $Y_{i(g-1)}$ is the standardized
same-subject pretest for student i in grade g - 1 during the prior year. The variable $Z_{i(g-1)}$ denotes the
pretest in the opposite subject. Thus, when estimating school effectiveness in math, Y represents
math tests with Z representing reading tests, and vice versa. The pretest scores captured prior inputs
into student achievement, and the associated coefficients, $\lambda_{1g}$ and $\omega_{1g}$, varied by grade. The vector
$X_i$ denotes the control variables for the individual student background characteristics listed in
Chapter II. The coefficients on these characteristics, $\alpha_1$, were constrained to be the same across all
grades. 9

$S_{ig}$ is a vector of school dosage variables containing one variable for each school-grade
combination. The measures of school effectiveness are school-grade effects contained in $\beta$, the
coefficients of the dosage variables represented by $S_{ig}$. For each grade, we included a measure of the
combined effectiveness of all schools students attended that were outside of DCPS. The dosage for
a given element of $S_{ig}$ was set equal to the percentage of the year student i was enrolled in grade g at
that school. The value of any element of $S_{ig}$ was zero if student i was not taught in grade g in that
school during the school year. Because $S_{ig}$ accounted for student attendance throughout the school
year, its elements always summed to one. Rather than dropping one of the school dosage variables
from the regression, we estimated the model without a constant term. We also mean centered the
control variables so that each element of $\beta$ represented a school- and grade-specific intercept term
for the average student. 10 We assumed that the error term, $\varepsilon_{1ig}$, is heteroskedastic.
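To make the structure of this regression concrete, the following Python sketch fits a heavily simplified version on synthetic, noise-free data. Everything in it is an assumption for illustration: it uses one grade, two schools, and a single mean-centered pretest, it omits the opposite-subject pretest and the background controls, and it solves the normal equations directly in pure Python. It does not address standard errors or heteroskedasticity and is not the estimation code used for the report.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def ols_no_constant(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y, no intercept."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


# Hypothetical data: six students, dosages for schools A and B sum to one.
pretest = [-1.0, 0.0, 1.0, 2.0, -2.0, 0.5]
dosage = [(1.0, 0.0), (1.0, 0.0), (0.5, 0.5), (0.0, 1.0), (0.0, 1.0), (1.0, 0.0)]

# Mean center the pretest so the school coefficients act as intercepts
# for a student with the average pretest score.
mean_pre = sum(pretest) / len(pretest)
centered = [p - mean_pre for p in pretest]

# Assumed "true" parameters used to generate noise-free posttest scores.
true_lambda, true_beta = 0.8, (0.2, -0.1)
posttest = [true_lambda * c + true_beta[0] * a + true_beta[1] * b
            for c, (a, b) in zip(centered, dosage)]

# Regressors: centered pretest plus BOTH dosage columns, with no constant.
X = [[c, a, b] for c, (a, b) in zip(centered, dosage)]
lambda_hat, beta_a_hat, beta_b_hat = ols_no_constant(X, posttest)
```

Because the two dosage columns sum to one for every student, they jointly play the role of the constant term, which is why neither is dropped. On this noise-free data the fit recovers the assumed parameters up to rounding error.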



9 We estimated a common, grade-invariant set of coefficients of student background characteristics because a 
preliminary investigation revealed substantial differences in sign and magnitude of grade-specific coefficients on these
covariates. These cross-grade differences appeared to reflect small within-grade samples of individuals with certain 
characteristics rather than true differences in the association between student characteristics and achievement growth. 
Estimating a common set of coefficients across grades allowed us to estimate the association between achievement and 
student characteristics using information from all grades, which smoothed out the implausibly large between-grade 
differences in these coefficients. 

10 Mean centering the student characteristics and pretest scores tends to reduce the estimated standard errors of the 
school effects (Wooldridge 2008). 






